Implement postgres migration for contributor and leaderboard #403
codebestia wants to merge 3 commits into SolFoundry:main from
Conversation
1438f59 to 0a963a6
Caution: Review failed. Pull request was closed or merged during review.

📝 Walkthrough

This PR implements a comprehensive PostgreSQL migration for contributor and leaderboard services, replacing in-memory storage with SQLAlchemy ORM models (

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 26
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/services/reputation_service.py (2)
394-415: ⚠️ Potential issue | 🟠 Major
History persistence and aggregate score updates can commit independently.
Lines 401-407 recompute and persist the contributor's aggregate reputation_score, then lines 410-415 persist the history entry in a separate step. If the history write fails, DB reads rebuilt from history no longer match the contributor row, and the inconsistency survives restarts because `get_reputation()` recomputes from history rather than the aggregate column.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/reputation_service.py` around lines 394 - 415, The current flow appends to the in-memory _reputation_store, updates the aggregate via contributor_service.update_reputation_score, then asynchronously persists the history with persist_reputation_entry, which can leave the aggregate out-of-sync if the history write fails; change the order and/or use a single transactional persist so history is durable before committing the aggregate: call persist_reputation_entry(entry) first (or wrap both persist_reputation_entry and contributor_service.update_reputation_score in a DB transaction), only update _reputation_store and call contributor_service.update_reputation_score(round(total, 2)) after the history write succeeds, and handle exceptions by not updating the aggregate when persist_reputation_entry raises; also ensure get_reputation still recomputes from history consistently.
503-522: ⚠️ Potential issue | 🟠 Major
The reputation leaderboard does an O(N) fan-out of repeated DB/history loads.
Lines 503-522 call `get_reputation()` once per contributor ID. Each call then re-fetches contributor data and reloads reputation history again, so leaderboard cost grows with contributor count and repeatedly scans the same history set instead of ranking in one query/pass.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/reputation_service.py` around lines 503 - 522, The current loop calls get_reputation(contributor_id, include_history=False) for each id, causing N repeated DB/history loads; instead implement a bulk path: add a new bulk function (e.g., get_reputations_bulk or extend get_reputation to accept a list) that takes all_ids and include_history flag and returns a list of ReputationSummary objects by fetching contributor rows and reputation histories in one query/pass, then replace the per-id loop in the leaderboard code (which currently calls contributor_service.list_contributor_ids and get_reputation) with a single call to that bulk function and then sort by (-reputation_score, username) and slice offset:offset+limit before returning.
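The single-pass ranking described above can be sketched in plain Python (the `rank_leaderboard` helper and the dict shape of each summary are illustrative stand-ins, not the service's actual API):

```python
def rank_leaderboard(summaries, offset=0, limit=10):
    """Sort pre-fetched summaries by (-reputation_score, username), then slice.

    `summaries` stands in for the result of a single bulk query; the real
    service would fetch contributor rows and histories in one pass first.
    """
    ordered = sorted(
        summaries,
        key=lambda s: (-s["reputation_score"], s["username"]),
    )
    return ordered[offset: offset + limit]

page = rank_leaderboard(
    [
        {"username": "bob", "reputation_score": 50.0},
        {"username": "alice", "reputation_score": 72.5},
        {"username": "carol", "reputation_score": 72.5},
    ],
    offset=0,
    limit=2,
)
```

Ties on reputation_score fall back to username order, so paging stays deterministic.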
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/alembic.ini`:
- Around line 1-51: Resolve the Git conflict markers in the alembic.ini by
removing the "<<<<<<<", "=======", ">>>>>>>" blocks and keeping a single
coherent INI content under the [alembic] section (pick the correct
script_location value used by your project: either "migrations/alembic" or
"alembic"); also remove the hardcoded credentials in sqlalchemy.url (the
sqlalchemy.url key) and replace it with a safe placeholder or an empty fallback
(e.g., a driver://user:pass@host/dbname placeholder) so env.py can reliably read
DATABASE_URL at runtime, and ensure any logger/handler sections you keep are
valid INI syntax.
In `@backend/alembic/env.py`:
- Around line 28-36: The current fallback DATABASE_URL string with real-looking
credentials is unsafe; update env.py so database_url (the variable computed and
passed to config.set_main_option) does not default to "postgres:postgres".
Instead, if os.getenv("DATABASE_URL") is missing, either raise a clear runtime
error (fail fast) unless in a dev mode flag, or set a non-functional placeholder
value; implement this check before the startswith/replace logic so the code
either raises an exception with a helpful message or uses an explicit
INVALID_DATABASE_URL placeholder when not in development (use your existing
environment indicator or add one) and then call
config.set_main_option("sqlalchemy.url", database_url).
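A fail-fast resolver along these lines is one option (the `APP_ENV` flag and the `resolve_database_url` name are assumptions, not existing project symbols):

```python
import os

def resolve_database_url(env=os.environ):
    """Fail fast when DATABASE_URL is unset outside development.

    APP_ENV is a hypothetical environment indicator; substitute whatever
    flag the project already uses.
    """
    url = env.get("DATABASE_URL")
    if url is None:
        if env.get("APP_ENV") != "development":
            raise RuntimeError(
                "DATABASE_URL is not set; refusing to fall back to default credentials"
            )
        url = "postgresql+asyncpg://localhost/solfoundry"  # credential-free dev fallback
    # normalize the driver prefix, as env.py already does
    if url.startswith("postgresql://") and "+asyncpg" not in url:
        url = url.replace("postgresql://", "postgresql+asyncpg://", 1)
    return url

resolved = resolve_database_url({"DATABASE_URL": "postgresql://db.example/solfoundry"})
```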
In `@backend/alembic/versions/001_create_contributors_table.py`:
- Around line 110-137: Add a database-level uniqueness constraint to the
reputation_history table to prevent duplicate rows for the same contributor and
bounty: update the op.create_table call for "reputation_history" (or add an
op.create_unique_constraint immediately after) to include a UniqueConstraint on
("contributor_id", "bounty_id") so the database enforces idempotency for
contributor_id + bounty_id and avoids race-condition inserts that the
reputation_service tries to deduplicate in application code.
In `@backend/app/api/contributors.py`:
- Around line 65-66: The current split of query params into skill_list and
badge_list returns raw tokens (e.g., ["python", " rust", ""]) which can break DB
filters; update the logic around skill_list and badge_list in contributors.py to
split on commas, trim whitespace from each token, drop any empty tokens, and
optionally normalize casing (e.g., lower()) before passing to DB; ensure you
still use parameterized queries or ORM filters to avoid injection risk when
using these cleaned lists.
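A minimal cleaning helper for the comma-separated params might look like this (the `parse_csv_param` name is hypothetical):

```python
def parse_csv_param(raw):
    """Split a comma-separated query param into clean, lowercase tokens.

    Returns None when nothing usable remains, matching the existing
    `skills.split(",") if skills else None` contract.
    """
    if not raw:
        return None
    tokens = [t.strip().lower() for t in raw.split(",")]
    tokens = [t for t in tokens if t]  # drop empty tokens from trailing commas
    return tokens or None

skill_list = parse_csv_param("python, Rust, ")
```

The cleaned lists should still be passed through ORM filters or parameterized queries, never interpolated into SQL.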
- Around line 3-10: The file contains unresolved Git conflict markers (<<<<<<<,
=======, >>>>>>>) in the module docstring and inside multiple endpoint handlers
which prevents FastAPI from importing the router; remove all conflict markers
and merge the text so the module docstring is a single coherent sentence (choose
the intended description), then reconcile duplicated or conflicting handler
blocks by keeping the correct implementation for each endpoint (look for symbols
like router, any `@router.get/`@router.post/@router.put/@router.delete handlers,
and functions such as
get_contributor/create_contributor/update_contributor/delete_contributor or
reputation-related delegators) so there are no leftover markers or duplicate
definitions; after merging, ensure imports and the router variable are intact
and run the linter/tests to confirm FastAPI can import the module.
In `@backend/app/api/health.py`:
- Around line 52-58: The health endpoint currently calls
contributor_service.count_contributors() and accesses
bounty_service._bounty_store directly which can crash the endpoint; wrap the
contributor_service.count_contributors() call in a try/except to catch DB
errors, log the exception, and set a safe fallback (e.g., None or -1) and mark
the metric/state as degraded instead of letting the error propagate, and replace
direct access to _bounty_store by adding and using a public function on
bounty_service (e.g., get_bounty_count()) to obtain bounty metrics; keep
get_last_sync() usage as-is but ensure all failures are handled similarly so the
health response never raises an exception.
In `@backend/app/api/leaderboard.py`:
- Around line 43-45: The query parameter name "range: Optional[str] =
Query(...)" in leaderboard endpoint shadows Python's built-in range(); rename
the parameter (e.g., time_range: Optional[str] = Query(None,
description="Frontend range: 7d, 30d, 90d, all", alias="range")) and update all
uses inside the endpoint handler to use time_range (or chosen name); if you need
to keep the external query param as "range" use Query's alias argument as shown
so OpenAPI/client behavior is unchanged, and update any tests/docs that
reference the old name.
In `@backend/app/models/contributor.py`:
- Around line 95-110: ReputationHistoryDB lacks a uniqueness constraint for the
idempotency key used by record_reputation(), so duplicate (contributor_id,
bounty_id) rows can be inserted; add a composite UNIQUE constraint (or
UniqueConstraint) on (contributor_id, bounty_id) in the ReputationHistoryDB
model and update the DB migration so the database enforces idempotency at
write-time; ensure any existing code that upserts or catches IntegrityError in
record_reputation() handles duplicate-key errors gracefully after the migration.
In `@backend/app/seed_leaderboard.py`:
- Around line 100-106: seed_leaderboard currently uses
asyncio.get_event_loop().run_until_complete(async_seed_leaderboard()) which will
break when called from an existing event loop; change seed_leaderboard() into an
async def seed_leaderboard() that simply awaits async_seed_leaderboard(), and
update both call sites (where main.py's lifespan() and seed_database.py's
seed_all() invoke seed_leaderboard) to use await seed_leaderboard() instead of
calling it synchronously or using asyncio.run()/run_until_complete().
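The async wrapper change can be sketched as follows (the body and return value of `async_seed_leaderboard` are placeholders for the real seeding work):

```python
import asyncio

async def async_seed_leaderboard():
    # stand-in for the real seeding coroutine
    await asyncio.sleep(0)
    return 5  # e.g. number of rows seeded

async def seed_leaderboard():
    # thin async wrapper: callers (lifespan, seed_all) now `await seed_leaderboard()`
    # instead of run_until_complete, which fails inside a running event loop
    return await async_seed_leaderboard()

seeded = asyncio.run(seed_leaderboard())
```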
In `@backend/app/services/contributor_service.py`:
- Around line 181-224: The SELECT used to produce paginated contributor results
(base_query used to compute rows_result with .offset(skip).limit(limit)) lacks
an ORDER BY, causing nondeterministic paging; modify the query-building in the
contributor list flow to apply a deterministic order (e.g., base_query =
base_query.order_by(ContributorTable.id) or by a timestamp like
ContributorTable.created_at DESC) before applying offset/limit, and keep
count_query unchanged; ensure the same order_by is used consistently so paging
returns stable, repeatable pages.
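The effect of ordering before slicing can be shown with a plain-Python analogue of `order_by(...).offset(skip).limit(limit)` (no SQLAlchemy involved; `stable_page` is illustrative):

```python
def stable_page(rows, skip, limit):
    # order deterministically (here by "id") before offset/limit,
    # mirroring base_query.order_by(ContributorTable.id).offset(skip).limit(limit)
    ordered = sorted(rows, key=lambda r: r["id"])
    return ordered[skip: skip + limit]

rows = [{"id": 3}, {"id": 1}, {"id": 2}]
page1 = stable_page(rows, skip=0, limit=2)
page2 = stable_page(rows, skip=2, limit=2)
```

Without the sort, consecutive pages could overlap or skip rows because the database is free to return unordered results in any order.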
- Around line 141-149: The insert path still allows a TOCTOU race to surface as
a 500 because commit/flush errors aren’t translated; wrap both await
session.flush() and the auto_session commit/refresh block in try/except that
catches SQLAlchemy integrity/unique constraint exceptions (e.g.
sqlalchemy.exc.IntegrityError or DBAPI-specific unique-violation), detect the
constraint for contributors.username, and raise/return the existing
duplicate/409 error (or translate to the same DuplicateUsername/HTTPConflict
used elsewhere) instead of letting the exception bubble; apply this change
around the symbols session.flush(),
async_session_factory()/auto_session.commit(), and auto_session.refresh(row).
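The translation pattern can be sketched like this; note that `IntegrityError` here is a local stand-in for `sqlalchemy.exc.IntegrityError`, and the `uq_contributors_username` constraint name is an assumption:

```python
class IntegrityError(Exception):
    """Stand-in for sqlalchemy.exc.IntegrityError."""

class DuplicateUsernameError(Exception):
    """Translated to an HTTP 409 at the API layer."""

def flush_contributor(flush, username):
    try:
        flush()  # stands in for await session.flush() / auto_session.commit()
    except IntegrityError as exc:
        # only translate the username unique violation; re-raise anything else
        if "uq_contributors_username" in str(exc) or "contributors.username" in str(exc):
            raise DuplicateUsernameError(username) from exc
        raise

def failing_flush():
    raise IntegrityError("duplicate key value violates uq_contributors_username")

try:
    flush_contributor(failing_flush, "octocat")
    outcome = "inserted"
except DuplicateUsernameError:
    outcome = "conflict"
```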
- Around line 424-460: update_reputation_score mutates
ContributorTable.reputation_score but never invalidates the leaderboard cache
(leaderboard_service.get_leaderboard), causing stale reads; fix by calling the
leaderboard cache invalidation API after a successful update: invoke
leaderboard_service.invalidate_leaderboard() (or the existing cache invalidation
function) immediately after the update when using an external session (after
db_session.flush() where row.reputation_score is set) and also after the
auto-commit path (right after await auto_session.commit()); ensure the
invalidation call only runs when a row was actually modified.
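The invalidate-on-write pattern, reduced to a minimal in-memory sketch (the real `leaderboard_service` cache is presumably more involved):

```python
class LeaderboardCache:
    """Tiny memoizing cache; call invalidate() after any reputation write."""

    def __init__(self):
        self._cached = None

    def get(self, compute):
        if self._cached is None:
            self._cached = compute()
        return self._cached

    def invalidate(self):
        self._cached = None

scores = {"alice": 10.0}
cache = LeaderboardCache()
first = cache.get(lambda: dict(scores))

scores["alice"] = 25.0  # the reputation_score update...
cache.invalidate()      # ...must be followed by invalidation, or reads stay stale
second = cache.get(lambda: dict(scores))
```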
In `@backend/app/services/github_sync.py`:
- Line 464: The skills list is being silently truncated with "skills[:10]" which
can lose contributor data; update the code around that assignment (the place
building the payload/dict with "skills": skills[:10]) to first check if
len(skills) > 10, and if so emit a warning using the module/service logger
(e.g., logger.warning or self.logger.warning) including the contributor
identifier (e.g., contributor_id or login) and the original count, then set the
field to skills[:10]; alternatively document the 10-item limit in the
function/class docstring or payload schema if truncation is intentional.
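A small guard that logs before truncating could look like this (the logger name and `MAX_SKILLS` constant are illustrative):

```python
import logging

logger = logging.getLogger("github_sync")
MAX_SKILLS = 10  # documented payload limit

def cap_skills(login, skills):
    """Truncate to MAX_SKILLS, but warn so the data loss is visible."""
    if len(skills) > MAX_SKILLS:
        logger.warning(
            "Truncating skills for %s: %d -> %d", login, len(skills), MAX_SKILLS
        )
    return skills[:MAX_SKILLS]

capped = cap_skills("octocat", [f"skill-{i}" for i in range(12)])
```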
- Around line 458-473: The upsert_contributor calls inside the sync loop must be
executed inside a single DB transaction to avoid partial commits; modify the
sync routine to open a transaction/session before the loop (or use
contributor_service.transaction/context manager if available), perform all
upsert_contributor(...) calls within that transaction, and commit once all
upserts succeed (or rollback on error), and ensure refresh_store_cache() runs
only after commit; alternatively, catch exceptions around the transaction to log
failures and avoid leaving synced_count partial state.
- Around line 470-471: The snippet is forcibly setting "created_at": now -
timedelta(days=45) on every sync which overwrites real creation timestamps for
new contributors; change the logic in the sync path that prepares contributor
payloads so it only sets created_at for true seeding (or when the DB record is
absent) and otherwise derives created_at from the contributor's actual earliest
activity (e.g., earliest PR/commit date) or omits the field so the database
default/previous value remains; update the code that calls upsert_contributor()
to pass the computed earliest_date (or no created_at) instead of the fixed now -
timedelta value and add a short comment documenting this behavior.
In `@backend/app/services/leaderboard_service.py`:
- Around line 175-177: The period cutoff is being applied to
ContributorTable.created_at which removes users created before the cutoff
instead of limiting the earnings/reputation used for ranking; change the logic
so the cutoff filters activity/earnings entries used to compute total_earnings
and reputation_score (e.g., apply _period_cutoff(period) to the
earnings/activity table timestamp column or to the subquery/aggregation that
produces total_earnings and reputation_score) rather than
ContributorTable.created_at so rankings at lines referencing total_earnings and
reputation_score reflect only the period window.
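The intended windowing, filtering activity rows by their own timestamps rather than `ContributorTable.created_at`, reduces to something like this sketch (dict-shaped entries stand in for the earnings/activity table):

```python
from datetime import datetime, timedelta, timezone

def total_earnings_in_period(entries, days):
    """Sum earnings whose own timestamp falls inside the window.

    Filtering the activity rows (not the contributor's created_at) keeps
    long-standing contributors in the ranking while scoping their totals.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return sum(e["amount"] for e in entries if e["ts"] >= cutoff)

now = datetime.now(timezone.utc)
entries = [
    {"amount": 100.0, "ts": now - timedelta(days=2)},   # inside a 30d window
    {"amount": 40.0, "ts": now - timedelta(days=60)},   # outside it
]
recent = total_earnings_in_period(entries, days=30)
```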
- Around line 109-112: The code is truncating floating reputation values by
casting row.reputation_score to int; update the response construction to
preserve the float (e.g., use float(row.reputation_score or 0.0)) wherever
reputation_score is currently cast to int (replace int(...) in the blocks that
build leaderboard entries and any other occurrence around reputation_score),
ensuring the API returns the full float value that matches the persisted/ranked
value.
In `@backend/app/services/reputation_service.py`:
- Around line 3-24: The file contains unresolved Git conflict markers (<<<<<<<,
=======, >>>>>>>) in the module docstring, helper docstrings, and multiple
function bodies which makes the module unimportable; open
backend/app/services/reputation_service.py, remove all conflict markers and
choose the correct content for each conflicted section (e.g., unify the module
docstring and the import block to either include logging and threading or
asyncio as needed for the current implementation), resolve each function body
conflict by keeping the intended logic (preserve calls to
contributor_service.update_reputation_score, reputation history handling, and
any async vs sync signatures), ensure imports match the chosen implementation
(replace or keep asyncio/logging/threading), run the linter/tests to verify
parsing and runtime behavior, and commit the cleaned file with no remaining
<<<<<<</======/>>>>>> markers.
In `@backend/requirements.txt`:
- Line 7: The requirements.txt contains a duplicate dependency entry for
"alembic>=1.13.0,<2.0.0"; remove the redundant line so the package is declared
only once (keep a single "alembic>=1.13.0,<2.0.0" entry) to eliminate
maintenance confusion and potential inconsistencies when updating dependencies.
In `@backend/scripts/seed_contributors_from_github.py`:
- Around line 111-136: The pagination loop that calls client.get and checks
response.status_code should implement retries with exponential backoff for
transient errors (handle 429 and 5xx) instead of immediately breaking; augment
the block around response = await client.get(url, headers=_headers(),
params=params) to retry a configurable number of times with increasing sleep,
inspect response.headers.get("X-RateLimit-Remaining") and sleep until reset or
backoff when remaining is low, and if retries are exhausted or a non-retriable
code is returned (e.g., persistent 4xx) log the full response and raise an
exception (or return a failure) rather than break to avoid silent partial data;
keep variables/flow using response, client.get, _headers(), params, per_page,
page, all_prs, and logger so you modify the existing loop and error handling
only.
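A retry loop with exponential backoff for transient statuses might be shaped like this (the `fetch` callable stands in for the real `client.get` call; the delays are shortened for illustration):

```python
import time

RETRIABLE = {429, 500, 502, 503, 504}

def fetch_with_retry(fetch, max_retries=3, base_delay=0.01):
    """Retry transient GitHub failures with exponential backoff.

    `fetch` returns (status_code, payload); non-retriable statuses and
    exhausted retries raise instead of silently returning partial data.
    """
    for attempt in range(max_retries + 1):
        status, payload = fetch()
        if status == 200:
            return payload
        if status not in RETRIABLE or attempt == max_retries:
            raise RuntimeError(f"GitHub request failed with status {status}")
        time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    return (429, None) if calls["n"] < 3 else (200, ["pr-1", "pr-2"])

prs = fetch_with_retry(flaky_fetch)
```

The real loop would additionally inspect `X-RateLimit-Remaining` and sleep until the reset time when the quota runs low.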
- Around line 224-228: The bot-filtering is brittle: update the logic around the
author variable (where avatar is also derived) to perform case-insensitive
checks and broader bot matching; e.g., compute author_lower = author.lower() and
then skip when author_lower.endswith("[bot]") or "bot" in author_lower or
author_lower is in an expanded set like
{"dependabot","github-actions","renovate","snyk-bot","codecov"} (or use a small
regex for common bot patterns) so that the existing check using
endswith("[bot]") and the tuple ("dependabot","github-actions") is replaced with
this normalized, extensible test.
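The normalized, extensible bot check described above, as a standalone sketch:

```python
KNOWN_BOTS = {"dependabot", "github-actions", "renovate", "snyk-bot", "codecov"}

def is_bot(author):
    """Case-insensitive bot detection: [bot] suffix or a known bot login."""
    a = author.lower()
    return a.endswith("[bot]") or a in KNOWN_BOTS

flags = [is_bot(a) for a in ("Dependabot", "octocat", "renovate[BOT]", "snyk-bot")]
```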
In `@backend/tests/conftest.py`:
- Around line 21-39: The custom global loop and accessor (_test_loop and
get_test_loop) duplicate pytest-asyncio behavior and have a wrong type hint;
change the annotation to Optional[asyncio.AbstractEventLoop] for _test_loop and
either remove this custom loop entirely or replace it with a proper pytest
fixture named event_loop that returns a new event loop for tests (using
asyncio.new_event_loop() and yielding/closing it) so all async tests and helpers
(e.g., run_async) use pytest-asyncio's loop; update call sites of get_test_loop
to use the pytest event_loop fixture or accept the fixture via parameters and
delete the global state and set_event_loop calls to avoid conflicts.
In `@backend/tests/test_contributors.py`:
- Around line 1-6: Update the module docstring in
backend/tests/test_contributors.py to accurately describe the test setup:
replace the incorrect "Uses an in-memory SQLite database for test isolation"
with a statement that the tests use the PostgreSQL engine provided by
app.database (the async PostgreSQL-backed contributor service). Keep the rest of
the docstring intact but ensure it references the PostgreSQL engine from
app.database rather than SQLite.
In `@backend/tests/test_leaderboard.py`:
- Around line 64-66: Remove the manual cache write to contributor_service._store
after calling upsert_contributor; instead invoke the service's cache-refresh
logic (call contributor_service.refresh_store_cache() after
run_async(contributor_service.upsert_contributor(row_data))) so the test uses
the same cache-population path as production and avoids duplicating internal
behavior (_store, upsert_contributor, refresh_store_cache).
In `@backend/tests/test_reputation.py`:
- Around line 72-101: The test helper is manually mutating the service cache by
calling contributor_service.upsert_contributor(...) then setting
contributor_service._store[cid] = row, causing duplicated cache-management
logic; either update the service method upsert_contributor to populate the
internal cache (_store) itself so tests only call upsert_contributor, or remove
the manual assignment and extract the pattern into a shared test fixture (e.g.,
a helper used by tests that returns both row and ContributorResponse) so tests
stop directly touching contributor_service._store; locate usages around
upsert_contributor, contributor_service._store, ContributorResponse, and
ContributorStats to implement the fix.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 56bac5d9-afd5-4a5e-a13d-d3be3040edc8
📒 Files selected for processing (24)
backend/alembic.ini
backend/alembic/__init__.py
backend/alembic/env.py
backend/alembic/script.py.mako
backend/alembic/versions/001_create_contributors_table.py
backend/alembic/versions/__init__.py
backend/app/api/contributors.py
backend/app/api/health.py
backend/app/api/leaderboard.py
backend/app/database.py
backend/app/models/contributor.py
backend/app/models/leaderboard.py
backend/app/seed_leaderboard.py
backend/app/services/contributor_service.py
backend/app/services/github_sync.py
backend/app/services/leaderboard_service.py
backend/app/services/reputation_service.py
backend/requirements.txt
backend/scripts/__init__.py
backend/scripts/seed_contributors_from_github.py
backend/tests/conftest.py
backend/tests/test_contributors.py
backend/tests/test_leaderboard.py
backend/tests/test_reputation.py
backend/alembic.ini
Outdated
```ini
<<<<<<< ours
[alembic]
script_location = migrations/alembic
# DB URL is overridden at runtime by env.py reading DATABASE_URL env var.
# Fallback only used when DATABASE_URL is not set.
sqlalchemy.url = postgresql+asyncpg://localhost/solfoundry
=======
# Alembic configuration for SolFoundry backend.
#
# Manages database schema migrations for PostgreSQL.
# The connection URL is read from the DATABASE_URL environment variable
# at runtime (see alembic/env.py).

[alembic]
script_location = alembic
prepend_sys_path = .
sqlalchemy.url = postgresql+asyncpg://postgres:postgres@localhost/solfoundry

[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARN
handlers = console

[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
>>>>>>> theirs
```
Unresolved merge conflict markers break Alembic configuration.
The file contains Git merge conflict markers (<<<<<<< ours, =======, >>>>>>> theirs) that will cause Alembic to fail when parsing the INI file. These must be resolved before merging.
Additionally, line 17 contains hardcoded database credentials (postgres:postgres@localhost). While the docstring states the URL is overridden at runtime via DATABASE_URL, committing default credentials is a security anti-pattern—consider using a placeholder like driver://user:pass@localhost/dbname or removing the fallback entirely.
backend/alembic/env.py
Outdated
```python
database_url = os.getenv(
    "DATABASE_URL",
    "postgresql+asyncpg://postgres:postgres@localhost/solfoundry",
)
if database_url.startswith("postgresql://") and "+asyncpg" not in database_url:
    database_url = database_url.replace(
        "postgresql://", "postgresql+asyncpg://", 1
    )
config.set_main_option("sqlalchemy.url", database_url)
```
Hardcoded fallback credentials in migration environment.
The fallback DATABASE_URL on line 30 contains default credentials (postgres:postgres). While overridden at runtime, this poses a risk if the environment variable is accidentally unset in production. Consider failing explicitly when DATABASE_URL is missing in non-development environments, or use a clearly invalid placeholder.
```python
op.create_table(
    "reputation_history",
    sa.Column(
        "id",
        postgresql.UUID(as_uuid=True),
        primary_key=True,
        nullable=False,
    ),
    sa.Column(
        "contributor_id",
        postgresql.UUID(as_uuid=True),
        sa.ForeignKey("contributors.id", ondelete="CASCADE"),
        nullable=False,
        index=True,
    ),
    sa.Column("bounty_id", postgresql.UUID(as_uuid=True), nullable=False),
    sa.Column("bounty_title", sa.String(255), nullable=False),
    sa.Column("bounty_tier", sa.Integer(), nullable=False),
    sa.Column("review_score", sa.Float(), nullable=False),
    sa.Column("earned_reputation", sa.Float(), nullable=False, server_default=sa.text("false")),
    sa.Column(
        "created_at",
        sa.DateTime(timezone=True),
        nullable=True,
        server_default=sa.func.now(),
    ),
)
```
Missing unique constraint on (contributor_id, bounty_id) in reputation_history.
The reputation_service implements idempotent duplicate handling in application code (returning existing entry if bounty_id matches). However, without a database-level unique constraint, concurrent requests could insert duplicate reputation records for the same bounty. Consider adding:
Consider adding:

```diff
+ sa.UniqueConstraint("contributor_id", "bounty_id", name="uq_reputation_contributor_bounty"),
```

This enforces idempotency at the database level and prevents race conditions.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
op.create_table(
    "reputation_history",
    sa.Column(
        "id",
        postgresql.UUID(as_uuid=True),
        primary_key=True,
        nullable=False,
    ),
    sa.Column(
        "contributor_id",
        postgresql.UUID(as_uuid=True),
        sa.ForeignKey("contributors.id", ondelete="CASCADE"),
        nullable=False,
        index=True,
    ),
    sa.Column("bounty_id", postgresql.UUID(as_uuid=True), nullable=False),
    sa.Column("bounty_title", sa.String(255), nullable=False),
    sa.Column("bounty_tier", sa.Integer(), nullable=False),
    sa.Column("review_score", sa.Float(), nullable=False),
    sa.Column("earned_reputation", sa.Float(), nullable=False),
    sa.Column("anti_farming_applied", sa.Boolean(), nullable=False, server_default=sa.text("false")),
    sa.Column(
        "created_at",
        sa.DateTime(timezone=True),
        nullable=True,
        server_default=sa.func.now(),
    ),
    sa.UniqueConstraint("contributor_id", "bounty_id", name="uq_reputation_contributor_bounty"),
)
```
```python
skill_list = skills.split(",") if skills else None
badge_list = badges.split(",") if badges else None
```
Comma-separated filters are forwarded without normalization.
Lines 65-66 split raw query strings but do not trim whitespace or drop empty tokens. Requests like ?skills=python, rust, become ["python", " rust", ""], which flows straight into the DB filters and can suppress otherwise valid matches. As per coding guidelines, backend/**: Analyze thoroughly: Input validation and SQL injection vectors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/contributors.py` around lines 65 - 66, The current split of
query params into skill_list and badge_list returns raw tokens (e.g., ["python",
" rust", ""]) which can break DB filters; update the logic around skill_list and
badge_list in contributors.py to split on commas, trim whitespace from each
token, drop any empty tokens, and optionally normalize casing (e.g., lower())
before passing to DB; ensure you still use parameterized queries or ORM filters
to avoid injection risk when using these cleaned lists.
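A minimal normalization helper in the spirit of this suggestion might look like the following sketch; `parse_csv_filter` is an illustrative name, not an existing function in the codebase:

```python
from typing import List, Optional


def parse_csv_filter(raw: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated query param, trimming whitespace and
    dropping empty tokens; None/empty input means 'no filter'."""
    if not raw:
        return None
    tokens = [token.strip().lower() for token in raw.split(",")]
    tokens = [token for token in tokens if token]
    return tokens or None


print(parse_csv_filter("python, rust,"))  # ['python', 'rust']
print(parse_csv_filter("  , ,"))          # None
```

The cleaned list is still passed through ORM filters or parameterized queries; normalization only removes junk tokens, it is not an injection defense by itself.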
| author = pr.get("user", {}).get("login", "unknown") | ||
| avatar = pr.get("user", {}).get("avatar_url", "") | ||
|
|
||
| if author.endswith("[bot]") or author in ("dependabot", "github-actions"): | ||
| continue |
🧹 Nitpick | 🔵 Trivial
Bot filtering may miss edge cases.
The bot exclusion logic checks for a `[bot]` suffix and exact matches for `dependabot`/`github-actions`. Consider:

- The `[bot]` check is case-sensitive (`endswith("[bot]")`) — GitHub bot accounts may vary in casing
- Other common bots like `renovate`, `codecov`, `snyk-bot` are not filtered
- if author.endswith("[bot]") or author in ("dependabot", "github-actions"):
+ if author.lower().endswith("[bot]") or author.lower() in (
+ "dependabot", "github-actions", "renovate", "codecov", "snyk-bot"
+ ):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/scripts/seed_contributors_from_github.py` around lines 224 - 228, The
bot-filtering is brittle: update the logic around the author variable (where
avatar is also derived) to perform case-insensitive checks and broader bot
matching; e.g., compute author_lower = author.lower() and then skip when
author_lower.endswith("[bot]") or "bot" in author_lower or author_lower is in an
expanded set like
{"dependabot","github-actions","renovate","snyk-bot","codecov"} (or use a small
regex for common bot patterns) so that the existing check using
endswith("[bot]") and the tuple ("dependabot","github-actions") is replaced with
this normalized, extensible test.
```python
# Shared event loop for all tests that need synchronous async execution
_test_loop: asyncio.AbstractEventLoop = None  # type: ignore


def get_test_loop() -> asyncio.AbstractEventLoop:
    """Return the shared test event loop, creating it if needed.

    This ensures all synchronous test helpers (``run_async``) use the
    same event loop, avoiding 'no current event loop' errors when
    running the full test suite.

    Returns:
        The shared asyncio event loop for tests.
    """
    global _test_loop
    if _test_loop is None or _test_loop.is_closed():
        _test_loop = asyncio.new_event_loop()
        asyncio.set_event_loop(_test_loop)
    return _test_loop
```
Custom event loop management may conflict with pytest-asyncio.
The custom _test_loop global with get_test_loop() duplicates functionality that pytest-asyncio provides via its event_loop fixture. This could lead to conflicts if async tests (@pytest.mark.asyncio) use a different loop than run_async() calls.
Additionally, line 22 has an incorrect type annotation: _test_loop: asyncio.AbstractEventLoop = None should be Optional[asyncio.AbstractEventLoop] = None.
Consider using pytest-asyncio's event_loop_policy fixture or scoped event_loop fixture instead of rolling your own loop management.
Fix type annotation:

```diff
+from typing import Optional
+
 # Shared event loop for all tests that need synchronous async execution
-_test_loop: asyncio.AbstractEventLoop = None  # type: ignore
+_test_loop: Optional[asyncio.AbstractEventLoop] = None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/conftest.py` around lines 21 - 39, The custom global loop and
accessor (_test_loop and get_test_loop) duplicate pytest-asyncio behavior and
have a wrong type hint; change the annotation to
Optional[asyncio.AbstractEventLoop] for _test_loop and either remove this custom
loop entirely or replace it with a proper pytest fixture named event_loop that
returns a new event loop for tests (using asyncio.new_event_loop() and
yielding/closing it) so all async tests and helpers (e.g., run_async) use
pytest-asyncio's loop; update call sites of get_test_loop to use the pytest
event_loop fixture or accept the fixture via parameters and delete the global
state and set_event_loop calls to avoid conflicts.
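For the annotation itself, a stand-alone sketch follows; it keeps the shared-loop helper only to show the corrected `Optional` typing and the recreate-on-close behavior, and is not a substitute for the pytest-asyncio fixture route suggested above:

```python
import asyncio
from typing import Optional

# Corrected annotation: the None sentinel must appear in the type.
_test_loop: Optional[asyncio.AbstractEventLoop] = None


def get_test_loop() -> asyncio.AbstractEventLoop:
    """Return the shared loop, recreating it if missing or closed."""
    global _test_loop
    if _test_loop is None or _test_loop.is_closed():
        _test_loop = asyncio.new_event_loop()
    return _test_loop


async def _probe() -> int:
    return 42


loop_a = get_test_loop()
result = loop_a.run_until_complete(_probe())
loop_b = get_test_loop()   # same object while the loop stays open
loop_a.close()
loop_c = get_test_loop()   # a closed loop is transparently replaced
loop_c.close()
```

Under pytest-asyncio, the fixture-provided loop would replace this global entirely, which also removes the risk of async tests and `run_async()` running on different loops.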
| """Tests for contributor profiles API with PostgreSQL persistence. | ||
|
|
||
| Tests verify CRUD operations through the HTTP API layer. All service | ||
| functions are async (DB-first reads) and are exercised via the ASGI | ||
| event loop through TestClient. DB reads are stubbed out to prevent | ||
| cross-test contamination from shared SQLite in-memory databases. | ||
| Verifies that the contributor CRUD endpoints work correctly against | ||
| the async PostgreSQL-backed contributor service. Uses an in-memory | ||
| SQLite database for test isolation. | ||
| """ |
Docstring incorrectly references SQLite.
The docstring states "Uses an in-memory SQLite database for test isolation" but the tests actually use the PostgreSQL engine from app.database. Update the docstring to reflect the actual implementation:
```diff
-Verifies that the contributor CRUD endpoints work correctly against
-the async PostgreSQL-backed contributor service. Uses an in-memory
-SQLite database for test isolation.
+Verifies that the contributor CRUD endpoints work correctly against
+the async PostgreSQL-backed contributor service. Uses the shared
+test database with per-test row deletion for isolation.
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """Tests for contributor profiles API with PostgreSQL persistence. | |
| Tests verify CRUD operations through the HTTP API layer. All service | |
| functions are async (DB-first reads) and are exercised via the ASGI | |
| event loop through TestClient. DB reads are stubbed out to prevent | |
| cross-test contamination from shared SQLite in-memory databases. | |
| Verifies that the contributor CRUD endpoints work correctly against | |
| the async PostgreSQL-backed contributor service. Uses an in-memory | |
| SQLite database for test isolation. | |
| """ | |
| """Tests for contributor profiles API with PostgreSQL persistence. | |
| Verifies that the contributor CRUD endpoints work correctly against | |
| the async PostgreSQL-backed contributor service. Uses the shared | |
| test database with per-test row deletion for isolation. | |
| """ |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/test_contributors.py` around lines 1 - 6, Update the module
docstring in backend/tests/test_contributors.py to accurately describe the test
setup: replace the incorrect "Uses an in-memory SQLite database for test
isolation" with a statement that the tests use the PostgreSQL engine provided by
app.database (the async PostgreSQL-backed contributor service). Keep the rest of
the docstring intact but ensure it references the PostgreSQL engine from
app.database rather than SQLite.
```python
row = run_async(contributor_service.upsert_contributor(row_data))
contributor_service._store[str(row.id)] = row
return row
```
🧹 Nitpick | 🔵 Trivial
Manual _store population duplicates service behavior.
After calling upsert_contributor(), the test helper manually inserts the row into _store. This bypasses the service's refresh_store_cache() mechanism and could cause test/production behavior divergence if the service's caching logic changes. Consider calling refresh_store_cache() instead or ensuring the service handles cache population internally on upsert.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/test_leaderboard.py` around lines 64 - 66, Remove the manual
cache write to contributor_service._store after calling upsert_contributor;
instead invoke the service's cache-refresh logic (call
contributor_service.refresh_store_cache() after
run_async(contributor_service.upsert_contributor(row_data))) so the test uses
the same cache-population path as production and avoids duplicating internal
behavior (_store, upsert_contributor, refresh_store_cache).
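One possible shape for moving cache population into the service, sketched with illustrative stand-ins rather than the real `contributor_service` API:

```python
from typing import Any, Dict


class ContributorServiceSketch:
    """Illustrative stand-in for contributor_service: the upsert itself
    refreshes the cache, so callers and tests never write to _store."""

    def __init__(self) -> None:
        self._db: Dict[str, Dict[str, Any]] = {}     # stands in for PostgreSQL
        self._store: Dict[str, Dict[str, Any]] = {}  # in-memory cache

    def upsert_contributor(self, row: Dict[str, Any]) -> Dict[str, Any]:
        cid = str(row["id"])
        self._db[cid] = row
        self._store[cid] = row  # cache population lives inside the service
        return row


service = ContributorServiceSketch()
row = service.upsert_contributor({"id": "c1", "username": "alice"})
```

With this shape, the test helpers shrink to a single `upsert_contributor()` call, and any future change to the caching strategy stays invisible to the test suite.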
```python
row = run_async(
    contributor_service.upsert_contributor(
        {
            "id": uuid.UUID(cid),
            "username": username,
            "display_name": username,
            "skills": ["python"],
            "badges": [],
            "social_links": {},
            "total_contributions": 0,
            "total_bounties_completed": 0,
            "total_earnings": Decimal("0"),
            "reputation_score": 0.0,
            "created_at": now,
            "updated_at": now,
        }
    )
)
contributor_service._store[cid] = row
return ContributorResponse(
    id=cid,
    username=username,
    display_name=username,
    skills=["python"],
    badges=[],
    social_links={},
    stats=ContributorStats(),
    created_at=now,
    updated_at=now,
)
```
🧹 Nitpick | 🔵 Trivial
Test helper duplicates _store cache management.
Similar to the leaderboard tests, _mc() manually inserts into _store after upserting. This pattern is repeated across test files and could be extracted into a shared fixture or the service could handle cache updates internally.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/test_reputation.py` around lines 72 - 101, The test helper is
manually mutating the service cache by calling
contributor_service.upsert_contributor(...) then setting
contributor_service._store[cid] = row, causing duplicated cache-management
logic; either update the service method upsert_contributor to populate the
internal cache (_store) itself so tests only call upsert_contributor, or remove
the manual assignment and extract the pattern into a shared test fixture (e.g.,
a helper used by tests that returns both row and ContributorResponse) so tests
stop directly touching contributor_service._store; locate usages around
upsert_contributor, contributor_service._store, ContributorResponse, and
ContributorStats to implement the fix.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
backend/app/database.py (1)
97-118: ⚠️ Potential issue | 🟠 Major
Schema bootstrap still fails open for DB-backed contributor features.
Lines 97-118 wrap the new contributor/reputation model imports and `Base.metadata.create_all()` in a blanket `except` that only logs a warning. After this migration, `init_db()` is building required persistence for contributor counting, leaderboard reads, seeding, and reputation writes, so a model-import or DDL failure should stop startup/tests instead of deferring the breakage to later requests.
As per coding guidelines, `backend/**`: "Analyze thoroughly: - Error handling and edge case coverage".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/database.py` around lines 97 - 118, The DB schema bootstrap currently swallows all exceptions around the model imports and await conn.run_sync(Base.metadata.create_all) (where ContributorTable, ReputationHistoryDB, ContributorTable, BountyTable, SubmissionDB, etc. are imported), which allows startup to continue with missing persistence; change this to fail fast by removing the blanket except Exception that only logs a warning and instead propagate the error (or log with logger.error and re-raise) so init_db() aborts startup/tests on import/DDL failures; alternatively catch only expected non-fatal errors and re-raise unexpected ones to ensure missing contributor/reputation tables stop startup.backend/app/services/reputation_service.py (2)
236-258: ⚠️ Potential issue | 🟠 Major
Tier gating and idempotency still depend on per-process cache state.
Lines 243-258 consult `_reputation_store` for duplicate `bounty_id` detection, unlocked-tier checks, and anti-farming. In a multi-worker deployment or any path that serves requests before hydration finishes, different processes can see different history and accept duplicate records or higher-tier submissions that PostgreSQL would already show as ineligible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/reputation_service.py` around lines 236 - 258, The current checks (duplicate bounty_id, allowed tier, anti-farming) rely on the in-memory _reputation_store and can be inconsistent across processes; replace those validations with authoritative DB queries/transactions: inside the critical section (or a DB transaction using contributor_service.get_contributor_db as needed) query PostgreSQL for existing reputation/history rows for data.contributor_id (and for an existing bounty_id) instead of reading _reputation_store, compute allowed tier using the persisted history (i.e., call _allowed_tier_for_contributor with the DB-derived history) and compute is_veteran from DB history as well, and enforce idempotency/tier checks based on those results (or use SELECT ... FOR UPDATE/unique constraints + handle conflicts) rather than the _reputation_store cache.
278-297: ⚠️ Potential issue | 🔴 Critical
The write path can acknowledge success without persisting the canonical event.
Lines 278-297 update the in-memory history and contributor aggregate before attempting `persist_reputation_entry(entry)`, then downgrade any history-write failure to a log line. `get_reputation()` later recomputes totals from history rows (Lines 321-353), so the API can return success even though the next DB-backed read/restart drops that reputation change.
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/reputation_service.py` around lines 278 - 297, The current flow updates _reputation_store and calls contributor_service.update_reputation_score before persisting, which can cause in-memory state to diverge if persist_reputation_entry(entry) fails; change the flow so you call persist_reputation_entry(entry) first and only update _reputation_store and call contributor_service.update_reputation_score after a successful persist, or if you must update memory first then ensure you rollback the in-memory append and revert the contributor score when persist_reputation_entry raises; locate the logic around _reputation_store.setdefault(data.contributor_id, []).append(entry), the subsequent round(total, 2) update via contributor_service.update_reputation_score, and the persist_reputation_entry call and implement the transactional ordering or compensating rollback to ensure get_reputation() will reflect only persisted entries.backend/app/api/leaderboard.py (1)
39-107: ⚠️ Potential issue | 🟠 Major
The endpoint no longer exposes the structured leaderboard schema it asks the service to build.
`backend/app/services/leaderboard_service.py:229-304` still returns a `LeaderboardResponse`, but Lines 79-107 always collapse it into a bare array and discard `period`, `total`, `offset`, `limit`, and `top3`. That is an API contract change for callers using the backend-style `period` path, and it conflicts with the PR objective to keep response schemas backward-compatible.
As per coding guidelines, `backend/**`: "Analyze thoroughly: - API contract consistency with spec".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/leaderboard.py` around lines 39 - 107, The current endpoint (function leaderboard) always returns a bare array built from result.entries and drops the service's LeaderboardResponse fields; update leaderboard to preserve the backend-style schema when callers use the backend param by checking whether the period argument was provided (i.e., period is not None) and, if so, return a JSONResponse containing the full structured payload: period, total, offset, limit, top3 (from the LeaderboardResponse returned by get_leaderboard) and the mapped contributors (built from result.entries); otherwise keep the existing frontend-friendly array response. Ensure you reference get_leaderboard, LeaderboardResponse, result.entries, and preserve all original fields (period, total, offset, limit, top3) in the response.
♻️ Duplicate comments (5)
backend/requirements.txt (1)
7-7: ⚠️ Potential issue | 🟡 Minor
Duplicate Alembic pin remains in the manifest.
Lines 7 and 20 declare the same `alembic>=1.13.0,<2.0.0` constraint. Pip will tolerate it, but dependency updates now have two authoritative-looking locations for the same package, and this duplication was already reported on an earlier revision.
Also applies to: 20-20
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/requirements.txt` at line 7, Remove the duplicate package constraint so the requirements manifest contains only a single entry for alembic (alembic>=1.13.0,<2.0.0); locate the second/extra occurrence of the exact string "alembic>=1.13.0,<2.0.0" in requirements.txt and delete it, ensuring no other duplicate package pins remain in the file.backend/app/api/health.py (1)
52-69: ⚠️ Potential issue | 🟠 Major
`/health` still turns DB outages into 500s.
After Lines 49-50 already determine the database is disconnected, Lines 57-58 still call `count_contributors()`, and `backend/app/services/contributor_service.py:570-593` executes a real SQL count. Any exception there aborts the endpoint instead of returning a degraded payload; the same block also imports `bounty_service._bounty_store`, so the health surface depends on a private in-memory implementation detail.
As per coding guidelines, `backend/**`: "Analyze thoroughly: - Error handling and edge case coverage".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/health.py` around lines 52 - 69, The health endpoint calls contributor_service.count_contributors() and directly references bounty_service._bounty_store even when db_status is "disconnected", which lets DB errors raise 500s and leaks a private implementation detail; change logic in the health handler to only call contributor_service.count_contributors() when db_status == "connected" (or wrap the call in a try/except and fall back to a safe default like None or 0 on error), and stop importing/using the private _bounty_store—use a public API (e.g., add/use a bounty_service.get_bounty_count() or similar) or compute a safe fallback count; ensure get_last_sync() remains unchanged but handle its None case as already done so the endpoint returns a degraded payload instead of raising on DB failures.backend/tests/conftest.py (1)
21-71: ⚠️ Potential issue | 🟠 Major
The shared test harness now leaks both DB state and loop state across the suite.
Lines 21-71 create one module-global event loop and one session-scoped SQLite schema, and `backend/app/database.py:36-44` uses `StaticPool` for SQLite. That means every test shares the same in-memory database, so persisted contributor/leaderboard state bleeds between tests, while pytest-asyncio tests can still run on a different loop than `run_async()`, which is risky for module-level asyncio primitives such as `backend/app/services/reputation_service.py:_reputation_lock`.
Expected result: the search should show both `run_async(...)` call sites and async tests, confirming mixed loop strategies over a single shared SQLite database.

```bash
#!/bin/bash
set -euo pipefail

echo "=== shared loop helper in conftest ==="
CONFTEST_FILE="$(fd '^conftest\.py$' backend/tests | head -n1)"
sed -n '18,80p' "$CONFTEST_FILE"
echo
echo "=== SQLite engine configuration ==="
DB_FILE="$(fd '^database\.py$' backend/app | head -n1)"
sed -n '28,46p' "$DB_FILE"
echo
echo "=== async tests and synchronous run_async call sites ==="
rg -n -C2 '@pytest\.mark\.asyncio|async def test_|run_async\(' backend/tests --type py
```
Verify each finding against the current code and only fix it if needed. In `@backend/tests/conftest.py` around lines 21 - 71, The conftest helpers leak a single module-global event loop and a session-scoped in-memory SQLite schema (symbols: _test_loop, get_test_loop, run_async, init_test_db) causing DB/state and loop-sharing across tests; fix by removing the global _test_loop and run_async wrapper, rely on pytest-asyncio's event_loop fixture for async tests (or make run_async obtain the current loop via asyncio.get_event_loop()), and change init_test_db to create a fresh database per test (function scope) or stop using StaticPool in app.database so each test gets its own in-memory DB; also ensure module-level asyncio primitives like _reputation_lock are instantiated per-loop or lazily inside functions to avoid cross-test loop/lock reuse.backend/app/seed_leaderboard.py (1)
100-106: ⚠️ Potential issue | 🔴 Critical
`seed_leaderboard()` still nests an event loop inside an async startup path.
Lines 100-106 call `asyncio.get_event_loop().run_until_complete(...)`, while the docstring says this wrapper is used from `main.py` lifespan. If the current call site is still an async lifespan or async seeding function, this raises at runtime before any contributor rows are seeded.
Expected result: at least one `seed_leaderboard()` call appears inside an async lifecycle or seeding function; if so, the synchronous wrapper is unsafe.

```bash
#!/bin/bash
set -euo pipefail

echo "=== seed_leaderboard definition ==="
SEED_FILE="$(fd '^seed_leaderboard\.py$' backend | head -n1)"
sed -n '96,110p' "$SEED_FILE"
echo
echo "=== seed_leaderboard call sites ==="
rg -n -C3 '\bseed_leaderboard\s*\(' backend --type py
echo
echo "=== async startup / seeding contexts ==="
rg -n -C3 'async def .*lifespan|@asynccontextmanager|async def .*seed_all' backend --type py
```
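If a synchronous entrypoint must stay, a loop-safe shape along the lines discussed in this thread can be sketched as follows (the counter stands in for the real seeding work):

```python
import asyncio

seeded = {"count": 0}


async def async_seed_leaderboard() -> None:
    seeded["count"] += 1  # dummy stand-in for the real seeding logic


def seed_leaderboard() -> None:
    """Synchronous entrypoint that never nests event loops: callers that
    are already inside a loop must await async_seed_leaderboard() instead."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running: safe to own one for the duration of seeding.
        asyncio.run(async_seed_leaderboard())
        return
    raise RuntimeError(
        "seed_leaderboard() called from async context; "
        "await async_seed_leaderboard() instead"
    )


async def _async_caller() -> bool:
    try:
        seed_leaderboard()
        return False
    except RuntimeError:
        return True


seed_leaderboard()                       # sync context: seeds once
rejected = asyncio.run(_async_caller())  # async context: refused, not deadlocked
```

The explicit failure replaces the silent `run_until_complete` crash with an actionable error, while `asyncio.run` keeps the pure-sync path working.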
Verify each finding against the current code and only fix it if needed. In `@backend/app/seed_leaderboard.py` around lines 100 - 106, seed_leaderboard() is a synchronous wrapper that calls asyncio.get_event_loop().run_until_complete(async_seed_leaderboard()), which is unsafe when invoked from an async lifespan/startup; remove this nested event-loop pattern by eliminating the sync wrapper and updating call sites to await async_seed_leaderboard() directly (e.g., replace usages in main.py lifespan or any async seed functions with "await async_seed_leaderboard()"); if a truly synchronous entrypoint is required, provide a safe alternative that checks for a running loop and uses asyncio.run(async_seed_leaderboard()) only when no loop is running or else raises a clear error—reference the functions seed_leaderboard and async_seed_leaderboard and the main.py lifespan call sites when applying the change.backend/alembic/versions/001_create_contributors_table.py (1)
110-137: ⚠️ Potential issue | 🟠 Major
Missing unique constraint on `(contributor_id, bounty_id)` allows duplicate reputation entries.
Without a database-level unique constraint, concurrent requests could insert duplicate reputation records for the same bounty, even if application code attempts deduplication. This was flagged in a prior review. Add:

```python
sa.UniqueConstraint("contributor_id", "bounty_id", name="uq_reputation_contributor_bounty"),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/alembic/versions/001_create_contributors_table.py` around lines 110 - 137, Add a DB-level unique constraint to the reputation table to prevent duplicate entries by modifying the op.create_table call for "reputation_history": include sa.UniqueConstraint("contributor_id", "bounty_id", name="uq_reputation_contributor_bounty") as one of the table elements in the op.create_table invocation so the reputation_history table enforces uniqueness on (contributor_id, bounty_id) at the database level.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/alembic/env.py`:
- Around line 17-18: Alembic's autogenerate isn't seeing the reputation_history
table because only ContributorTable is imported into env.py; import the
ReputationHistoryDB model as well so target_metadata = Base.metadata includes
its schema. Add an import for ReputationHistoryDB (the model/class named
ReputationHistoryDB) alongside the existing from app.models.contributor import
ContributorTable so Alembic can detect and include future schema changes for
reputation_history.
In `@backend/alembic/versions/001_create_contributors_table.py`:
- Line 125: Add a database index on reputation_history.bounty_id to avoid full
table scans when queries filter by bounty_id; update the migration (in the
revision that defines reputation_history) to create an index for the bounty_id
column (e.g., using op.create_index with a name like
ix_reputation_history_bounty_id) and add the corresponding op.drop_index in the
downgrade to cleanly roll back; ensure the index creation references the
reputation_history table and the bounty_id column.
- Around line 104-108: Update the index creation for
ix_contributors_reputation_earnings so the physical index matches the query
ORDER BY used in leaderboard_service (total_earnings DESC, reputation_score
DESC): change the op.create_index call in the migration to create the index with
explicit DESC ordering for both columns using PostgreSQL-aware expressions
(e.g., pass column expressions like sa.text("total_earnings DESC") and
sa.text("reputation_score DESC") or use the Postgres-specific options) so the
index on the contributors table matches the DESC order used in
backend/app/services/leaderboard_service.py.
In `@backend/app/services/github_sync.py`:
- Around line 393-396: The function-level imports of contributor_service and
Decimal inside sync_contributors are acceptable to avoid circular import
problems but lack an explanatory note; add a short comment directly above the
in-function imports in sync_contributors explaining that these imports are
intentionally local to prevent circular imports (mention contributor_service and
Decimal by name) so future maintainers understand the reason and do not move
them to module scope.
In `@backend/app/services/leaderboard_service.py`:
- Around line 195-207: The category filter contains a redundant dialect check —
remove the if/else branching around db_session.bind.dialect.name and apply a
single query.where using cast(ContributorTable.skills,
String).like(f'%"{category.value}"%') so the filter logic is not duplicated;
update the code that currently references category, db_session,
ContributorTable.skills, and query.where to use one unified branch.
- Around line 179-193: Redundant dialect check: remove the
db_session.bind.dialect.name == "postgresql" conditional and collapse both
branches into a single statement that applies the badge filter; specifically, in
the code that builds tier_label and filters ContributorTable.badges, replace the
if/else block with one query = query.where(cast(ContributorTable.badges,
String).like(f'%"{tier_label}"%')) so only that single expression remains (refer
to tier_label, db_session.bind, ContributorTable.badges, and the
cast(...).like(...) call).
In `@backend/app/services/reputation_service.py`:
- Around line 44-47: The current hydration only merges when load_reputation()
returns truthy, leaving stale in-memory entries when the DB is empty; change the
logic in the hydration block (where load_reputation(), _reputation_lock, and
_reputation_store are used) to always update the in-memory store under
_reputation_lock: acquire _reputation_lock, clear or reset _reputation_store,
then update it with the loaded payload (treat None as an empty mapping) so an
empty DB state replaces prior in-memory state rather than preserving it.
In `@backend/scripts/seed_contributors_from_github.py`:
- Around line 29-30: The script currently mutates sys.path via
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")), which is
fragile; remove this sys.path manipulation and switch to running the script as a
proper module or using package-relative imports. Replace the ad-hoc path hack by
restoring normal imports in seed_contributors_from_github.py (remove the
sys.path.insert call) and update how it is invoked (e.g., run with python -m
backend.scripts.seed_contributors_from_github from the project root or adjust
imports to use the package namespace) so the code relies on the package import
system rather than modifying sys.path at runtime. Ensure any import lines in the
script use the package-qualified names referenced by the repository instead of
relative filesystem hacks.
In `@backend/tests/test_leaderboard.py`:
- Around line 257-269: The timing assertion in
test_leaderboard_under_100ms_with_cache is flaky; change the test to be more
CI-tolerant by raising the threshold (e.g., from 100ms to 200ms) and adding a
small retry loop that runs run_async(get_leaderboard()) up to 3 times (with a
short sleep) and uses the minimum elapsed_ms for the assertion; keep references
to the test function name test_leaderboard_under_100ms_with_cache and the
get_leaderboard/run_async calls so the change is localized.
---
Outside diff comments:
In `@backend/app/api/leaderboard.py`:
- Around line 39-107: The current endpoint (function leaderboard) always returns
a bare array built from result.entries and drops the service's
LeaderboardResponse fields; update leaderboard to preserve the backend-style
schema when callers use the backend param by checking whether the period
argument was provided (i.e., period is not None) and, if so, return a
JSONResponse containing the full structured payload: period, total, offset,
limit, top3 (from the LeaderboardResponse returned by get_leaderboard) and the
mapped contributors (built from result.entries); otherwise keep the existing
frontend-friendly array response. Ensure you reference get_leaderboard,
LeaderboardResponse, result.entries, and preserve all original fields (period,
total, offset, limit, top3) in the response.
In `@backend/app/database.py`:
- Around line 97-118: The DB schema bootstrap currently swallows all exceptions
around the model imports and await conn.run_sync(Base.metadata.create_all)
(where ContributorTable, ReputationHistoryDB, ContributorTable, BountyTable,
SubmissionDB, etc. are imported), which allows startup to continue with missing
persistence; change this to fail fast by removing the blanket except Exception
that only logs a warning and instead propagate the error (or log with
logger.error and re-raise) so init_db() aborts startup/tests on import/DDL
failures; alternatively catch only expected non-fatal errors and re-raise
unexpected ones to ensure missing contributor/reputation tables stop startup.
In `@backend/app/services/reputation_service.py`:
- Around line 236-258: The current checks (duplicate bounty_id, allowed tier,
anti-farming) rely on the in-memory _reputation_store and can be inconsistent
across processes; replace those validations with authoritative DB
queries/transactions: inside the critical section (or a DB transaction using
contributor_service.get_contributor_db as needed) query PostgreSQL for existing
reputation/history rows for data.contributor_id (and for an existing bounty_id)
instead of reading _reputation_store, compute allowed tier using the persisted
history (i.e., call _allowed_tier_for_contributor with the DB-derived history)
and compute is_veteran from DB history as well, and enforce idempotency/tier
checks based on those results (or use SELECT ... FOR UPDATE/unique constraints +
handle conflicts) rather than the _reputation_store cache.
- Around line 278-297: The current flow updates _reputation_store and calls
contributor_service.update_reputation_score before persisting, which can cause
in-memory state to diverge if persist_reputation_entry(entry) fails; change the
flow so you call persist_reputation_entry(entry) first and only update
_reputation_store and call contributor_service.update_reputation_score after a
successful persist, or if you must update memory first then ensure you rollback
the in-memory append and revert the contributor score when
persist_reputation_entry raises; locate the logic around
_reputation_store.setdefault(data.contributor_id, []).append(entry), the
subsequent round(total, 2) update via
contributor_service.update_reputation_score, and the persist_reputation_entry
call and implement the transactional ordering or compensating rollback to ensure
get_reputation() will reflect only persisted entries.
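The persist-first ordering can be modeled in a minimal sketch (persist_entry, update_score, and reputation_store are hypothetical stand-ins for persist_reputation_entry, contributor_service.update_reputation_score, and _reputation_store):

```python
import asyncio

reputation_store: dict = {}

async def persist_entry(entry: dict) -> None:
    # Stand-in for the real DB write; raising must leave memory untouched.
    if entry.get("fail"):
        raise RuntimeError("db write failed")

async def update_score(contributor_id: str, score: float) -> None:
    reputation_store.setdefault("__scores__", []).append({contributor_id: score})

async def award(contributor_id: str, entry: dict) -> bool:
    # 1. Durable write first: if this raises, nothing below runs.
    try:
        await persist_entry(entry)
    except RuntimeError:
        return False
    # 2. Only after a successful persist do we touch the in-memory cache
    #    and the aggregate score, so reads never see unpersisted entries.
    history = reputation_store.setdefault(contributor_id, [])
    history.append(entry)
    total = sum(e["points"] for e in history)
    await update_score(contributor_id, round(total, 2))
    return True

async def main():
    ok = await award("alice", {"points": 10.0})
    failed = await award("alice", {"points": 5.0, "fail": True})
    return ok, failed, len(reputation_store["alice"])

print(asyncio.run(main()))  # → (True, False, 1)
```

If memory must be updated first instead, the compensating alternative is to pop the appended entry and restore the previous score whenever the persist call raises.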
---
Duplicate comments:
In `@backend/alembic/versions/001_create_contributors_table.py`:
- Around line 110-137: Add a DB-level unique constraint to the reputation table
to prevent duplicate entries by modifying the op.create_table call for
"reputation_history": include sa.UniqueConstraint("contributor_id", "bounty_id",
name="uq_reputation_contributor_bounty") as one of the table elements in the
op.create_table invocation so the reputation_history table enforces uniqueness
on (contributor_id, bounty_id) at the database level.
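Sketched as a migration fragment (not runnable on its own; it assumes the surrounding Alembic op context, the sa import, and the existing column definitions, elided here):

```python
op.create_table(
    "reputation_history",
    # ... existing sa.Column definitions ...
    sa.UniqueConstraint(
        "contributor_id",
        "bounty_id",
        name="uq_reputation_contributor_bounty",
    ),
)
```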
In `@backend/app/api/health.py`:
- Around line 52-69: The health endpoint calls
contributor_service.count_contributors() and directly references
bounty_service._bounty_store even when db_status is "disconnected", which lets
DB errors raise 500s and leaks a private implementation detail; change logic in
the health handler to only call contributor_service.count_contributors() when
db_status == "connected" (or wrap the call in a try/except and fall back to a
safe default like None or 0 on error), and stop importing/using the private
_bounty_store—use a public API (e.g., add/use a
bounty_service.get_bounty_count() or similar) or compute a safe fallback count;
ensure get_last_sync() remains unchanged but handle its None case as already
done so the endpoint returns a degraded payload instead of raising on DB
failures.
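A minimal sketch of the degraded-health shape (build_health_payload, count_contributors, and get_bounty_count are hypothetical stand-ins; the contributor stub deliberately raises to show the fallback):

```python
def count_contributors() -> int:
    raise ConnectionError("db unreachable")  # stub: simulate a DB failure

def get_bounty_count() -> int:
    return 0  # stub for a public bounty-count API (not _bounty_store)

def build_health_payload(db_status: str) -> dict:
    contributors = None
    if db_status == "connected":
        try:
            contributors = count_contributors()
        except Exception:
            contributors = None  # degrade instead of surfacing a 500
    return {
        "db": db_status,
        "contributors": contributors,
        "bounties": get_bounty_count(),
    }

print(build_health_payload("disconnected"))
# → {'db': 'disconnected', 'contributors': None, 'bounties': 0}
```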
In `@backend/app/seed_leaderboard.py`:
- Around line 100-106: seed_leaderboard() is a synchronous wrapper that calls
asyncio.get_event_loop().run_until_complete(async_seed_leaderboard()), which is
unsafe when invoked from an async lifespan/startup; remove this nested
event-loop pattern by eliminating the sync wrapper and updating call sites to
await async_seed_leaderboard() directly (e.g., replace usages in main.py
lifespan or any async seed functions with "await async_seed_leaderboard()"); if
a truly synchronous entrypoint is required, provide a safe alternative that
checks for a running loop and uses asyncio.run(async_seed_leaderboard()) only
when no loop is running or else raises a clear error—reference the functions
seed_leaderboard and async_seed_leaderboard and the main.py lifespan call sites
when applying the change.
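A safe synchronous entrypoint under those constraints might look like this (async_seed_leaderboard is reduced to a stub so the sketch is self-contained):

```python
import asyncio

async def async_seed_leaderboard() -> str:
    return "seeded"  # stub for the real async seeding routine

def seed_leaderboard() -> str:
    """Synchronous entrypoint that refuses to nest event loops."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running: safe to own one.
        return asyncio.run(async_seed_leaderboard())
    raise RuntimeError(
        "seed_leaderboard() called from an async context; "
        "await async_seed_leaderboard() instead"
    )

print(seed_leaderboard())  # → seeded
```

Async call sites (e.g. the main.py lifespan hook) would simply `await async_seed_leaderboard()` and never touch the wrapper.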
In `@backend/requirements.txt`:
- Line 7: Remove the duplicate package constraint so the requirements manifest
contains only a single entry for alembic (alembic>=1.13.0,<2.0.0); locate the
second/extra occurrence of the exact string "alembic>=1.13.0,<2.0.0" in
requirements.txt and delete it, ensuring no other duplicate package pins remain
in the file.
In `@backend/tests/conftest.py`:
- Around line 21-71: The conftest helpers leak a single module-global event loop
and a session-scoped in-memory SQLite schema (symbols: _test_loop,
get_test_loop, run_async, init_test_db) causing DB/state and loop-sharing across
tests; fix by removing the global _test_loop and run_async wrapper, rely on
pytest-asyncio's event_loop fixture for async tests (or make run_async obtain
the current loop via asyncio.get_event_loop()), and change init_test_db to
create a fresh database per test (function scope) or stop using StaticPool in
app.database so each test gets its own in-memory DB; also ensure module-level
asyncio primitives like _reputation_lock are instantiated per-loop or lazily
inside functions to avoid cross-test loop/lock reuse.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5a07b5ac-fb57-4301-90fe-c55d08d4e017
📒 Files selected for processing (24)
- backend/alembic.ini
- backend/alembic/__init__.py
- backend/alembic/env.py
- backend/alembic/script.py.mako
- backend/alembic/versions/001_create_contributors_table.py
- backend/alembic/versions/__init__.py
- backend/app/api/contributors.py
- backend/app/api/health.py
- backend/app/api/leaderboard.py
- backend/app/database.py
- backend/app/models/contributor.py
- backend/app/models/leaderboard.py
- backend/app/seed_leaderboard.py
- backend/app/services/contributor_service.py
- backend/app/services/github_sync.py
- backend/app/services/leaderboard_service.py
- backend/app/services/reputation_service.py
- backend/requirements.txt
- backend/scripts/__init__.py
- backend/scripts/seed_contributors_from_github.py
- backend/tests/conftest.py
- backend/tests/test_contributors.py
- backend/tests/test_leaderboard.py
- backend/tests/test_reputation.py
backend/alembic/env.py
Outdated
# Import all models so Alembic can detect them
from app.models.contributor import ContributorTable  # noqa: F401
🧩 Analysis chain
🏁 Scripts executed:

#!/bin/bash
# Check if ReputationHistoryDB exists in contributor.py and uses Base
rg -n "class ReputationHistoryDB" backend/app/models/

# Check current imports in env.py around lines 17-18
head -30 backend/alembic/env.py | cat -n

# Check if there are other model imports beyond what's shown
rg -n "from app.models" backend/alembic/env.py

# Check if ReputationHistoryDB and ContributorTable both inherit from Base
rg -A 5 "class (ContributorTable|ReputationHistoryDB)" backend/app/models/contributor.py

# Check migration files to see what tables are created
find backend/alembic/versions -name "*.py" -type f | head -5 | xargs grep -l "reputation_history" 2>/dev/null || echo "No migration files found mentioning reputation_history"

# Check the migration file to see what tables it creates
cat -n backend/alembic/versions/001_create_contributors_table.py

# Check if there are any other migration files beyond the first one
find backend/alembic/versions -name "*.py" -type f | sort

# Check if any of those migrations use autogenerate pattern (look for `@revision` markers or revision info)
rg -l "autogenerate" backend/alembic/versions/ 2>/dev/null || echo "No autogenerate marker found"

# Verify both models are defined with Base
rg -B 1 "class (ContributorTable|ReputationHistoryDB)" backend/app/models/contributor.py

Repository: SolFoundry/solfoundry
Import ReputationHistoryDB so Alembic autogenerate detects schema changes.
The migration creates a reputation_history table, but only ContributorTable is imported for Alembic's metadata detection. Since target_metadata = Base.metadata relies on imported models to populate the schema, ReputationHistoryDB must be imported for Alembic's autogenerate command to detect future changes to the reputation_history table:
# Import all models so Alembic can detect them
from app.models.contributor import ContributorTable  # noqa: F401
+from app.models.contributor import ReputationHistoryDB  # noqa: F401

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Suggested change:
# Import all models so Alembic can detect them
from app.models.contributor import ContributorTable  # noqa: F401
from app.models.contributor import ReputationHistoryDB  # noqa: F401
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/alembic/env.py` around lines 17 - 18, Alembic's autogenerate isn't
seeing the reputation_history table because only ContributorTable is imported
into env.py; import the ReputationHistoryDB model as well so target_metadata =
Base.metadata includes its schema. Add an import for ReputationHistoryDB (the
model/class named ReputationHistoryDB) alongside the existing from
app.models.contributor import ContributorTable so Alembic can detect and include
future schema changes for reputation_history.
op.create_index(
    "ix_contributors_reputation_earnings",
    "contributors",
    ["total_earnings", "reputation_score"],
)
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Scripts executed:

#!/bin/bash
# Check how leaderboard queries order results
rg -n "order_by" backend/app/services/leaderboard_service.py

# Read the leaderboard_service.py file around line 209 to see the actual order_by clause
sed -n '200,220p' backend/app/services/leaderboard_service.py

# Also search for any DESC patterns in leaderboard-related queries
rg -n "DESC|desc" backend/app/services/leaderboard_service.py

# Check the full leaderboard_service.py file to see all order_by usages
rg -B 2 -A 2 "order_by" backend/app/services/leaderboard_service.py

# Check if there are other leaderboard-related queries that might have different ordering
rg -n "leaderboard" backend/app/services/ -i --type py

# Check the migration file to see the exact index definition
cat -n backend/alembic/versions/001_create_contributors_table.py | sed -n '100,115p'

Repository: SolFoundry/solfoundry
Index column order does not match query pattern — consider explicit DESC ordering.
The index ix_contributors_reputation_earnings is created with default ascending order on ["total_earnings", "reputation_score"], but the leaderboard queries in backend/app/services/leaderboard_service.py (lines 209–211) use ORDER BY total_earnings DESC, reputation_score DESC. While PostgreSQL can use ascending indexes for descending queries via backward scans, this is less efficient than having the index match the query order.
The migration already documents the intended pattern in its comment; align the index definition with explicit DESC ordering using Alembic's PostgreSQL-specific options for optimal query performance.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/alembic/versions/001_create_contributors_table.py` around lines 104 -
108, Update the index creation for ix_contributors_reputation_earnings so the
physical index matches the query ORDER BY used in leaderboard_service
(total_earnings DESC, reputation_score DESC): change the op.create_index call in
the migration to create the index with explicit DESC ordering for both columns
using PostgreSQL-aware expressions (e.g., pass column expressions like
sa.text("total_earnings DESC") and sa.text("reputation_score DESC") or use the
Postgres-specific options) so the index on the contributors table matches the
DESC order used in backend/app/services/leaderboard_service.py.
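One possible shape for the DESC-aligned index, as a migration fragment (assumes the Alembic op context and the sa alias; verify against the Alembic version in use, since ordered expressions rely on sa.text):

```python
op.create_index(
    "ix_contributors_reputation_earnings",
    "contributors",
    [sa.text("total_earnings DESC"), sa.text("reputation_score DESC")],
)
```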
    nullable=False,
    index=True,
),
sa.Column("bounty_id", postgresql.UUID(as_uuid=True), nullable=False),
🧹 Nitpick | 🔵 Trivial
Consider adding index on reputation_history.bounty_id for lookups.
The bounty_id column is used for idempotency checks (finding existing entries by bounty), but lacks an index. Queries filtering by bounty_id will require full table scans.
- sa.Column("bounty_id", postgresql.UUID(as_uuid=True), nullable=False),
+ sa.Column("bounty_id", postgresql.UUID(as_uuid=True), nullable=False, index=True),

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:
sa.Column("bounty_id", postgresql.UUID(as_uuid=True), nullable=False, index=True),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/alembic/versions/001_create_contributors_table.py` at line 125, Add a
database index on reputation_history.bounty_id to avoid full table scans when
queries filter by bounty_id; update the migration (in the revision that defines
reputation_history) to create an index for the bounty_id column (e.g., using
op.create_index with a name like ix_reputation_history_bounty_id) and add the
corresponding op.drop_index in the downgrade to cleanly roll back; ensure the
index creation references the reputation_history table and the bounty_id column.
| """Sync merged PRs + known payouts → PostgreSQL + in-memory cache.""" | ||
| from app.services import contributor_service | ||
| from decimal import Decimal | ||
| import uuid |
🧹 Nitpick | 🔵 Trivial
Function-level imports are acceptable but unconventional.
The imports for contributor_service and Decimal inside sync_contributors() avoid circular import issues but deviate from PEP 8 conventions. This is acceptable given the module's structure, but consider adding a brief comment explaining why.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/github_sync.py` around lines 393 - 396, The
function-level imports of contributor_service and Decimal inside
sync_contributors are acceptable to avoid circular import problems but lack an
explanatory note; add a short comment directly above the in-function imports in
sync_contributors explaining that these imports are intentionally local to
prevent circular imports (mention contributor_service and Decimal by name) so
future maintainers understand the reason and do not move them to module scope.
if category:
    if db_session.bind and db_session.bind.dialect.name == "postgresql":
        query = query.where(
            cast(ContributorTable.skills, String).like(
                f'%"{category.value}"%'
            )
        )
    else:
        query = query.where(
            cast(ContributorTable.skills, String).like(
                f'%"{category.value}"%'
            )
        )
🧹 Nitpick | 🔵 Trivial
Same redundant dialect check for category filter.
Lines 196-207 duplicate the same pattern: identical if/else branches. Simplify by removing the dialect check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/leaderboard_service.py` around lines 195 - 207, The
category filter contains a redundant dialect check — remove the if/else
branching around db_session.bind.dialect.name and apply a single query.where
using cast(ContributorTable.skills, String).like(f'%"{category.value}"%') so the
filter logic is not duplicated; update the code that currently references
category, db_session, ContributorTable.skills, and query.where to use one
unified branch.
loaded = await load_reputation()
if loaded:
    async with _reputation_lock:
        _reputation_store.update(loaded)
Hydration never removes stale reputation history.
Lines 44-47 only merge load_reputation() results into _reputation_store, and they skip the update entirely when the DB returns an empty payload. A second hydration after deletions—or any empty DB state—leaves old contributors/history resident in memory even though PostgreSQL is supposed to be the source of truth.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/reputation_service.py` around lines 44 - 47, The current
hydration only merges when load_reputation() returns truthy, leaving stale
in-memory entries when the DB is empty; change the logic in the hydration block
(where load_reputation(), _reputation_lock, and _reputation_store are used) to
always update the in-memory store under _reputation_lock: acquire
_reputation_lock, clear or reset _reputation_store, then update it with the
loaded payload (treat None as an empty mapping) so an empty DB state replaces
prior in-memory state rather than preserving it.
# Ensure the backend package is importable
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
🧹 Nitpick | 🔵 Trivial
sys.path manipulation for imports is fragile.
Inserting .. into sys.path is a common workaround but can break if the script is invoked from unexpected directories. Consider using a proper package structure or running via python -m scripts.seed_contributors_from_github from the backend directory as documented.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/scripts/seed_contributors_from_github.py` around lines 29 - 30, The
script currently mutates sys.path via sys.path.insert(0,
os.path.join(os.path.dirname(__file__), "..")), which is fragile; remove this
sys.path manipulation and switch to running the script as a proper module or
using package-relative imports. Replace the ad-hoc path hack by restoring normal
imports in seed_contributors_from_github.py (remove the sys.path.insert call)
and update how it is invoked (e.g., run with python -m
backend.scripts.seed_contributors_from_github from the project root or adjust
imports to use the package namespace) so the code relies on the package import
system rather than modifying sys.path at runtime. Ensure any import lines in the
script use the package-qualified names referenced by the repository instead of
relative filesystem hacks.
def test_leaderboard_under_100ms_with_cache():
    """Cached leaderboard response returns within 100ms target."""
    for i in range(10):
        _seed_contributor(
            f"perf{i}", f"Perf {i}", total_earnings=float(100 * i)
        )
    run_async(get_leaderboard())  # warm the cache
    start = time.time()
    run_async(get_leaderboard())
    elapsed_ms = (time.time() - start) * 1000
    assert elapsed_ms < 100, (
        f"Cached leaderboard took {elapsed_ms:.1f}ms (target <100ms)"
    )
Timing-based test may be flaky in CI environments.
The assertion elapsed_ms < 100 depends on system performance. In resource-constrained CI runners, this could fail spuriously. Consider:
- Increasing the threshold with margin (e.g., 200ms)
- Using a retry mechanism
- Marking as a performance benchmark rather than hard assertion
-    assert elapsed_ms < 100, (
+    # Allow headroom for CI variability; actual target is <100ms
+    assert elapsed_ms < 200, (
         f"Cached leaderboard took {elapsed_ms:.1f}ms (target <100ms)"
     )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/test_leaderboard.py` around lines 257 - 269, The timing
assertion in test_leaderboard_under_100ms_with_cache is flaky; change the test
to be more CI-tolerant by raising the threshold (e.g., from 100ms to 200ms) and
adding a small retry loop that runs run_async(get_leaderboard()) up to 3 times
(with a short sleep) and uses the minimum elapsed_ms for the assertion; keep
references to the test function name test_leaderboard_under_100ms_with_cache and
the get_leaderboard/run_async calls so the change is localized.
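The best-of-N idea can be sketched independently of the harness (best_of is a hypothetical helper; the real test would time run_async(get_leaderboard()) rather than the toy lambda):

```python
import time

def best_of(n: int, fn) -> float:
    """Return the minimum elapsed time in ms over n runs, damping CI noise."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return min(samples)

elapsed_ms = best_of(3, lambda: sum(range(1000)))
assert elapsed_ms < 200, f"took {elapsed_ms:.1f}ms (target <100ms, 200ms ceiling)"
```

Taking the minimum of several runs filters out one-off scheduler hiccups, while the raised ceiling keeps the assertion meaningful without being flaky.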
9c72bbd to
a414e79
Compare
❌ Multi-LLM Code Review — REQUEST CHANGES
Aggregated Score: 5.5/10 (median of 5 models)
Model Verdicts
Category Scores (Median)
Warning: CodeRabbit Critical Issues
CodeRabbit flagged 4 critical issues that appear unresolved.
Warning: Bounty Spec Compliance: MINIMAL
Multiple models flagged that this submission does not fully meet the bounty acceptance criteria.
Summary
Sonnet 4.6: This submission has fundamental blocking issues with unresolved merge conflict markers in critical files that prevent the application from running. While the overall architecture and PostgreSQL migration approach is sound, the presence of Git conflict markers in alembic.ini, contributors.py, and reputation_service.py makes this submission non-functional.
Issues
Suggestions
Contributor stats: 6 merged bounty PRs, rep score 53
SolFoundry Multi-LLM Review Pipeline v3.0 — GPT-5.4 + Gemini 2.5 Pro + Grok 4 + Sonnet 4.6 + DeepSeek V3.2
Next Steps
Please address the issues above and push updated commits. The review will re-run automatically.
Description
This PR migrates the contributor and leaderboard services from an in-memory storage MVP to full PostgreSQL persistence utilizing asyncpg and SQLAlchemy.

Specifically, this PR implements the following:
- Updated contributor_service to utilize async PostgreSQL sessions, guaranteeing consistency.
- Updated leaderboard_service to query ranks via the database instead of in-memory caches, with an added TTL cache layer to maintain <100ms query performance.
- Added an Alembic migration (001_create_contributors_table) to build missing database tables safely.
- Kept Pydantic response schemas backwards-compatible so no frontend changes are required.

Closes #186
Solana Wallet for Payout
Wallet: 4QhseKvBuaCQhdkP248iXoUxohPzVC5m8pE9hAv4nMYw
Type of Change
Checklist
No console.log or debugging code left behind
Testing
Screenshots (if applicable)
N/A